Unimodal Bandits without Smoothness

Authors

  • Richard Combes
  • Alexandre Proutière
Abstract

We consider stochastic bandit problems with a continuum set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. We propose Stochastic Pentachotomy (SP), an algorithm for which we derive finite-time regret upper bounds. In particular, we show that, for any expected reward function $\mu$ that behaves as $\mu(x) = \mu(x^\star) - C|x - x^\star|^{\xi}$ locally around its maximizer $x^\star$ for some $\xi, C > 0$, the SP algorithm is order-optimal, i.e., its regret scales as $O(\sqrt{T \log(T)})$ when the time horizon $T$ grows large. This regret scaling is achieved without the knowledge of $\xi$ and $C$. Our algorithm is based on asymptotically optimal sequential statistical tests used to successively prune an interval that contains the best arm with high probability. To our knowledge, the SP algorithm constitutes the first sequential arm selection rule that achieves a regret scaling as $O(\sqrt{T})$ up to a logarithmic factor for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.
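To make the idea of pruning an interval around the best arm concrete, here is a minimal Python sketch. It is not the Stochastic Pentachotomy algorithm from the paper: it swaps in a much simpler fixed-sample trisection rule in place of the sequential statistical tests the abstract describes. The function names (`make_reward`, `prune_interval`), the Gaussian noise model, and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_reward(x_star=0.3, mu_star=1.0, C=1.0, xi=1.0):
    """Return mu(x) = mu_star - C * |x - x_star| ** xi, unimodal on [0, 1].

    All default values are illustrative assumptions, not taken from the paper.
    """
    def mu(x):
        return mu_star - C * abs(x - x_star) ** xi
    return mu


def prune_interval(mu, lo=0.0, hi=1.0, rounds=8, samples_per_arm=2000,
                   noise=0.1, seed=0):
    """Shrink [lo, hi] around the maximizer of mu using noisy samples.

    Simplified stand-in for interval pruning: each round samples two interior
    arms a fixed number of times and discards the outer third next to the arm
    with the lower empirical mean. SP itself uses sequential tests instead.
    """
    rng = random.Random(seed)

    def empirical_mean(x):
        # Noisy rewards: expected reward plus zero-mean Gaussian noise (assumed).
        total = sum(mu(x) + rng.gauss(0.0, noise) for _ in range(samples_per_arm))
        return total / samples_per_arm

    for _ in range(rounds):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if empirical_mean(a) < empirical_mean(b):
            lo = a  # by unimodality the maximizer is unlikely to lie left of a
        else:
            hi = b  # by unimodality the maximizer is unlikely to lie right of b
    return lo, hi


if __name__ == "__main__":
    mu = make_reward()
    print(prune_interval(mu))
```

Each round keeps two thirds of the interval, so the width after `rounds` iterations is `(2/3) ** rounds` times the initial width. The fixed `samples_per_arm` budget is the main simplification: it ignores how hard the two arms are to tell apart, which is what a sequential test would adapt to.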

Similar resources

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope, 2009; Yu & Mannor, 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which ca...

Unimodal Bandits

We consider multiarmed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine if a given arm is optimal by sampling the possible directions...

Consistency of the Maximum Product of Spacings Method and Estimation of a Unimodal Distribution

The first part of this paper gives some general consistency theorems for the maximum product of spacings (MPS) method, an estimation method related to maximum likelihood. The second part deals with nonparametric estimation of a concave (convex) distribution and more generally a unimodal distribution, without smoothness assumptions on the densities. The MPS estimator for a distribution function ...

Verification Based Solution for Structured MAB Problems

We consider the problem of finding the best arm in a stochastic Multi-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the bes...

Journal:
  • CoRR

Volume: abs/1406.7447

Publication date: 2014